Abstract. Global sensitivity analysis of a numerical code, more specifically estimation of Sobol indices
associated with input variables, generally requires a large number of model runs. When those demand
too much computation time, it is necessary to perform sensitivity analysis on a reduced model (metamodel),
whose outputs are numerically close to those of the original model, while being much faster
to run. In this case, estimated indices are subject to two kinds of errors: sampling error, caused by the
computation of the integrals appearing in the definition of the Sobol indices by a Monte-Carlo method,
and metamodel error, caused by the replacement of the original model by the metamodel. In cases
where we have certified bounds for the metamodel error, we propose a method to quantify both types
of error, and we compute confidence intervals for first-order Sobol indices.

1. Context

1.1. Monte-Carlo estimation of first-order Sobol indices 

Let $Y = f(X_1, \ldots, X_p)$ be our (scalar) output of interest, where the input variables $X_1, \ldots, X_p$ are modelled
as independent random variables of known distribution. For $i = 1, \ldots, p$, we recall the first-order Sobol index:

$$S_i = \frac{\mathrm{Var}(\mathbb{E}(Y \mid X_i))}{\mathrm{Var}(Y)}$$

which measures, on a scale of 0 to 1, the fraction of the total variability of the output caused by the variability
in $X_i$ alone.
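
As a quick illustration (a toy model chosen here for convenience, not one studied in this paper), take
$Y = X_1 + 2 X_2$ with $X_1$ and $X_2$ independent and of unit variance. Then $\mathbb{E}(Y \mid X_1) = X_1 + 2\,\mathbb{E}(X_2)$, and:

$$S_1 = \frac{\mathrm{Var}(X_1)}{\mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2)} = \frac{1}{5}, \qquad S_2 = \frac{4\,\mathrm{Var}(X_2)}{\mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2)} = \frac{4}{5},$$

so that $X_2$ alone explains 80% of the output variance. For an additive model with independent inputs such as this
one, the first-order indices sum to 1.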

As $f$ is generally known only implicitly (for instance, $f$ can be a functional of the solution of a partial differential
equation parametrized by functions of the input variables $X_1, \ldots, X_p$), one has no analytical expression for $S_i$ and has
to resort to numerical estimation. The variances in the definition of $S_i$ can be expressed as multidimensional
integrals over the input parameter space. We use Monte-Carlo estimates for these multidimensional integrals: let
$\{X^{k}\}_{k=1,\ldots,N}$ and $\{X'^{k}\}_{k=1,\ldots,N}$ be two independent, identically distributed vector samples of $X = (X_1, \ldots, X_p)$.
For $k = 1, \ldots, N$, we note:

$$y_k = f(X^{k}) \quad \text{and} \quad y'_k = f(X'^{k}_1, \ldots, X'^{k}_{i-1}, X^{k}_i, X'^{k}_{i+1}, \ldots, X'^{k}_p).$$

We take the following statistical estimator for $S_i$, introduced in [5]:

$$\hat{S}_i(\mathcal{E}) = \frac{\frac{1}{N} \sum_{k=1}^{N} y_k y'_k - \left( \frac{1}{N} \sum_{k=1}^{N} y_k \right) \left( \frac{1}{N} \sum_{k=1}^{N} y'_k \right)}{\frac{1}{N} \sum_{k=1}^{N} (y_k)^2 - \left( \frac{1}{N} \sum_{k=1}^{N} y_k \right)^2}$$

where $\mathcal{E} = (\{X^{k}\}_{k=1,\ldots,N}, \{X'^{k}\}_{k=1,\ldots,N})$ is the couple of samples the estimator depends on.
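
The estimator above can be sketched in a few lines of Python. This is only a minimal illustration: the function
names, the toy model $Y = X_1 + 2 X_2$ and its standard Gaussian inputs are assumptions made for the example
(they do not come from the paper), and the index `i` is 0-based in the code.

```python
import random


def sobol_first_order(f, sample_input, i, N, rng):
    """Monte-Carlo estimate of the first-order index of input i (0-based)."""
    sum_y = sum_yp = sum_y_yp = sum_y2 = 0.0
    for _ in range(N):
        x = sample_input(rng)    # X^k
        xp = sample_input(rng)   # X'^k, drawn independently of X^k
        xp[i] = x[i]             # coordinate i comes from X^k, all others from X'^k
        y, yp = f(x), f(xp)      # y_k and y'_k
        sum_y += y
        sum_yp += yp
        sum_y_yp += y * yp
        sum_y2 += y * y
    mean_y, mean_yp = sum_y / N, sum_yp / N
    numerator = sum_y_yp / N - mean_y * mean_yp
    denominator = sum_y2 / N - mean_y ** 2
    return numerator / denominator


# Toy model (assumption for illustration): exact indices are 1/5 and 4/5.
def toy_model(x):
    return x[0] + 2.0 * x[1]


def toy_inputs(rng):
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


rng = random.Random(0)
s1 = sobol_first_order(toy_model, toy_inputs, 0, 100_000, rng)
s2 = sobol_first_order(toy_model, toy_inputs, 1, 100_000, rng)
print(s1, s2)  # should be close to 0.2 and 0.8, up to sampling error
```

Each index costs $2N$ model evaluations in this naive form, which is what motivates replacing $f$ by a cheaper
metamodel when a single run is expensive.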

1.2. Reduced basis metamodels 

In order to apply reduced basis metamodelling, we further assume that the output $f(X)$ depends on a
function $u(X)$ where, for every input $X$, $u(X)$ satisfies an $X$-dependent partial differential equation (PDE).

To make things clear, let us consider an example: we take $p = 2$, so that $X = (X_1, X_2)$, and take for
$u(t, x; X_1, X_2)$ the solution of the following $(X_1, X_2)$-dependent initial-boundary value problem (viscous Burgers'
equation):

$$\begin{cases} u(t = 0, x) = u_{0m} + 5 \sin(0.5 x) & \forall x \in [0; 1] \\ u(t, x = 0) = b_0 \\ u(t, x = 1) = b_1 \end{cases} \qquad (1)$$

where our input parameters are $(X_1, X_2) = (\nu, u_{0m})$, and $b_0$ and $b_1$ are chosen so that the compatibility conditions hold:
$b_0 = u_{0m}$ and $b_1 = u_{0m} + 5 \sin(0.5)$.

This problem can be analyzed by means of the Cole-Hopf substitution (see [4] for instance), which turns
the equation of interest into the heat equation, leading to an integral representation of $u$ and well-posedness for
$u \in C^0([0, T], H^1(]0, 1[))$.

Note that the symbol $x$ denotes the spatial variable that $u(X)$ depends on, and is unrelated to the parameters
$X_1$ and $X_2$. Our output can be, for instance: $f(X) = \int_0^T \int_0^1 u(t, x, X) \, dx \, dt$.

For a given value of $X$, $u(X)$ is generally approximated using a numerical method, such as the finite-element
method. These methods work by searching for $u(X)$ in a linear subspace of high dimension $\mathcal{N}$; this leads to
a large linear system (or a succession of linear systems) to solve for the coefficients of (the approximation of)
$u(X)$ in a fixed basis of this subspace. This gives what we call the "full" discrete solution, which we again denote by $u(X)$.
Even if efficient methods have been developed to solve the linear systems arising from such discretizations, the
large number of unknowns to be found is often responsible for long computation times.

The reduced basis method is based on the fact that $\mathcal{N}$ has to be large because the basis we expand $u(X)$
in does not depend on the PDE problem that is being solved; hence it is too "generic": it can represent a
large number of functions well, but allows many more degrees of freedom than needed. We split the computation
into two phases: the offline phase, where we seek a "reduced space", whose dimension $n$ is much smaller than
$\mathcal{N}$, and which is suitable for effectively representing $u(X)$ for various values of the input parameter $X$; and
the online phase, where, for each required value of the input parameters, we solve the "projected" PDE on the
reduced space.

This method is worthwhile if we have to solve the PDE for a number of parameter values large enough
that the fixed cost of the offline phase is offset by the gain in marginal cost offered by the online
phase compared with the standard discretization. This is often the case with Monte-Carlo estimations.
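
To make this break-even point explicit, here is a back-of-the-envelope estimate using symbols introduced only
for this illustration: $T_{\mathrm{off}}$ for the one-time cost of the offline phase, $t_{\mathrm{on}}$ for the cost of one online evaluation,
and $t_{\mathrm{full}}$ for the cost of one solve with the standard discretization. For $M$ parameter values, the reduced
basis approach is cheaper when:

$$T_{\mathrm{off}} + M \, t_{\mathrm{on}} < M \, t_{\mathrm{full}} \quad \Longleftrightarrow \quad M > \frac{T_{\mathrm{off}}}{t_{\mathrm{full}} - t_{\mathrm{on}}},$$

assuming $t_{\mathrm{on}} < t_{\mathrm{full}}$. The larger the per-evaluation saving $t_{\mathrm{full}} - t_{\mathrm{on}}$, the sooner the offline cost is recovered.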
One crucial feature of the reduced basis method, which we will rely on later, is that it provides a certified
error bound $\epsilon_u(X, t)$ on the reduced solution $\tilde{u}$, which satisfies ($\|\cdot\|$ being the usual norm on $L^2([0, 1])$):

$$\| u(X; t) - \tilde{u}(X; t) \| \le \epsilon_u(X, t) \qquad \forall X, \ \forall t \in [0; T]$$

and, of course, $\epsilon_u$ can be fully and quickly computed (with a computation time of the same order of magnitude
as that of $\tilde{u}$). This error bound on $u$ can lead to an error bound $\epsilon$ on the output:

$$| f(X) - \tilde{f}(X) | \le \epsilon(X) \qquad \forall X$$

where $\tilde{f}(X)$ denotes a functional of the reduced solution.
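
As an example of how the bound on $u$ carries over to the output, assume that $f(X) = \int_0^T \int_0^1 u(t, x, X) \, dx \, dt$
and that $\tilde{f}$ is the same functional applied to the reduced solution $\tilde{u}$ (an assumption made here for illustration;
the construction actually used may differ). Since $[0, 1]$ has length 1, the Cauchy-Schwarz inequality gives:

$$| f(X) - \tilde{f}(X) | \le \int_0^T \int_0^1 | u - \tilde{u} | \, dx \, dt \le \int_0^T \| u(X; t) - \tilde{u}(X; t) \| \, dt \le \int_0^T \epsilon_u(X, t) \, dt,$$

so that $\epsilon(X) = \int_0^T \epsilon_u(X, t) \, dt$ is one admissible output error bound in this setting.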

One can turn to [7] for a detailed introduction to the reduced basis method, and to [5] for the extension to
the viscous Burgers equation (1).

2. Construction of combined confidence intervals 

2.1. Metamodel error 

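For a couple of samples $\mathcal{E} = (\{X^{k}\}_{k=1,\ldots,N}, \{X'^{k}\}_{k=1,\ldots,N})$, we can use our metamodel output $\tilde{f}$ and our
metamodel error bound $\epsilon$ to compute, for $k = 1, \ldots, N$:

$$\tilde{y}_k = \tilde{f}(X^{k}), \qquad \tilde{y}'_k = \tilde{f}(X'^{k}_1, \ldots, X'^{k}_{i-1}, X^{k}_i, X'^{k}_{i+1}, \ldots, X'^{k}_p)$$

and:

$$\epsilon_k = \epsilon(X^{k}), \qquad \epsilon'_k = \epsilon(X'^{k}_1, \ldots, X'^{k}_{i-1}, X^{k}_i, X'^{k}_{i+1}, \ldots, X'^{k}_p).$$

In [B], we show that we can compute rigorous and accurate bounds $\hat{S}_i^{m}$ and $\hat{S}_i^{M}$ depending only on the $\tilde{y}_k$, $\tilde{y}'_k$, $\epsilon_k$
and $\epsilon'_k$, so that:

$$\hat{S}_i^{m}(\mathcal{E}) \le \hat{S}_i(\mathcal{E}) \le \hat{S}_i^{M}(\mathcal{E})$$

where $\hat{S}_i(\mathcal{E})$ is the (unknown) value of the estimator of $S_i$ computed on the couple of samples $\mathcal{E}$. We emphasize
that, in our approach, the $y_k$ and $y'_k$ are not observed, as no evaluation of the full model is performed.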

2.2. Combined confidence intervals 

To take sampling error into account, we use a bootstrap procedure (see [I]) on the two bounding estimators $\hat{S}_i^{m}$
and $\hat{S}_i^{M}$. More specifically, we draw $N$ numbers with repetition from $\{1, 2, \ldots, N\}$, so as to get a random list $L$.
We then get two bootstrap replications by computing $\hat{S}_i^{m}$ and $\hat{S}_i^{M}$ using the couple of samples $(\{X^{k}\}_{k \in L}, \{X'^{k}\}_{k \in L})$
instead of $(\{X^{k}\}_{k=1,\ldots,N}, \{X'^{k}\}_{k=1,\ldots,N})$. We repeat those computations a fixed number $B$ of times, so as
to obtain $B$ couples of replications $\hat{S}_i^{m,1}, \ldots, \hat{S}_i^{m,B}$ and $\hat{S}_i^{M,1}, \ldots, \hat{S}_i^{M,B}$. Now, for a fixed risk level $\alpha \in ]0; 1[$,
let $\hat{S}_{i,\alpha/2}^{m}$ be the $\alpha/2$ quantile of $\hat{S}_i^{m,1}, \ldots, \hat{S}_i^{m,B}$ and $\hat{S}_{i,1-\alpha/2}^{M}$ be the $1 - \alpha/2$ quantile of
$\hat{S}_i^{M,1}, \ldots, \hat{S}_i^{M,B}$. We take $[\hat{S}_{i,\alpha/2}^{m} ; \hat{S}_{i,1-\alpha/2}^{M}]$ as our combined confidence interval for $S_i$. This confidence interval
accounts for both metamodel and sampling error.
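
The bootstrap loop described above might look as follows in Python. This is a sketch under stated assumptions:
the formulas for the two bounding estimators are not reproduced in this section, so they appear as the
hypothetical callables `lower_fn` and `upper_fn`; in the demo the metamodel error is taken to be zero, in which
case both bounds reduce to the plain estimator. The toy model $Y = X_1 + 2 X_2$ and all names are illustrative only.

```python
import math
import random


def pick_freeze(data):
    """Plain estimator of S_i from a list of (y_k, y'_k) pairs."""
    N = len(data)
    mean_y = sum(y for y, _ in data) / N
    mean_yp = sum(yp for _, yp in data) / N
    num = sum(y * yp for y, yp in data) / N - mean_y * mean_yp
    den = sum(y * y for y, _ in data) / N - mean_y ** 2
    return num / den


def quantile(values, q):
    """Empirical q-quantile using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


def combined_interval(data, lower_fn, upper_fn, B, alpha, rng):
    """Bootstrap the lower and upper bounding estimators with the same index lists."""
    N = len(data)
    lower_reps, upper_reps = [], []
    for _ in range(B):
        L = [rng.randrange(N) for _ in range(N)]  # N indices drawn with repetition
        resampled = [data[k] for k in L]          # the same list L is used for both samples
        lower_reps.append(lower_fn(resampled))
        upper_reps.append(upper_fn(resampled))
    return quantile(lower_reps, alpha / 2), quantile(upper_reps, 1 - alpha / 2)


# Demo with zero metamodel error (assumption): both bounds equal the plain estimator.
rng = random.Random(1)
data = []
for _ in range(1000):
    x1, x2, x2p = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    data.append((x1 + 2 * x2, x1 + 2 * x2p))  # (y_k, y'_k) for the first input
low, high = combined_interval(data, pick_freeze, pick_freeze, B=200, alpha=0.05, rng=rng)
print(low, high)
```

With a genuine metamodel, each item of `data` would also carry the error bounds $\epsilon_k$ and $\epsilon'_k$, and `lower_fn` and
`upper_fn` would differ, which widens the interval by the metamodel error.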

2.3. Choice of sample size and reduced basis size 

Increasing $N$ and/or $n$ will increase the overall computation time (because a larger number of surrogate
simulations must be performed if $N$ is increased or, if $n$ is increased, because each surrogate simulation takes more time to
complete due to a larger linear system to solve). However, increasing these parameters will also improve the
precision of the calculation (thanks to a reduction in sampling error for increased $N$, or in metamodel
error for increased $n$). In practice, one wants to estimate sensitivity indices with a given precision (i.e. to
produce $(1 - \alpha)$-level confidence intervals of prescribed length), and has no a priori indication on how to
choose $N$ and $n$ to do so. Moreover, for one given precision, there may be multiple suitable couples
$(N, n)$, balancing between sampling and metamodel error. We wish to choose the best one, that is, the one that
gives the smallest computation time.

On the one hand, we evaluate computation time: an analysis of the reduced basis method shows that the
most costly operation made during a call to the metamodel is the resolution of a linear system of $n$ equations;
this resolution can be done (e.g., by using Gauss' algorithm) with $O(n^3)$ operations. This has to be multiplied
by the required number of online evaluations, i.e. the sample size $N$. Hence, we may assume that computation
time is proportional to $N \times n^3$.

On the other hand, the mean length of the $(1 - \alpha)$-level confidence intervals for $S_1, \ldots, S_p$ can be written
as the sum of two terms. The first, depending on $N$, accounts for sampling error and can be modelled as
$\frac{2 q_\alpha \sigma}{\sqrt{N}}$, where $\frac{\sigma}{\sqrt{N}}$ is the standard deviation of $\hat{S}_i$ and $q_\alpha$ is an appropriate $\alpha$-dependent quantile of the standard
Gaussian distribution. The assumption of $1/\sqrt{N}$ decay is heuristically deduced from the central limit theorem.

The second term, which accounts for metamodel error, is assumed to decay exponentially when $n$ increases:
$C / a^{n}$, where $C > 0$ and $a > 1$ are constants. This assumption is backed up by numerical experiments as well
as theoretical works [I].

We now wish to minimize computation time while keeping a fixed precision $p$:

$$\text{Find } (N^*, n^*) = \operatorname*{argmin}_{(N, n) \in \mathbb{R}^+ \times \mathbb{R}^+} n^3 \times N \quad \text{so that} \quad \frac{2 q_\alpha \sigma}{\sqrt{N}} + \frac{C}{a^{n}} = p. \qquad (2)$$

The resolution of this problem is an elementary calculus argument. The solution involves the parameters $C$,
$a$ and $\sigma$, which can be fitted against confidence interval lengths found during a "benchmark run".
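
One possible way to carry out this minimization numerically is sketched below. It assumes the length model
$2 q_\alpha \sigma / \sqrt{N} + C / a^{n} = p$ and the case $C > p$ (otherwise the metamodel term alone never exceeds the target).
Eliminating $N$ through the constraint leaves the cost $n^3 / (p - C a^{-n})^2$ up to a constant factor, whose stationarity
condition is $3 p \, a^{n} = C (3 + 2 n \ln a)$; the code solves it by bisection. The function name and the parameter
values in the demo are hypothetical, not fitted values from the paper.

```python
import math


def optimal_sizes(p, sigma, q_alpha, C, a):
    """Minimise n**3 * N subject to 2*q_alpha*sigma/sqrt(N) + C/a**n == p (real-valued N, n)."""
    if not (0.0 < p < C and a > 1.0 and sigma > 0.0 and q_alpha > 0.0):
        raise ValueError("this sketch assumes 0 < p < C, a > 1, sigma > 0 and q_alpha > 0")
    A = 2.0 * q_alpha * sigma
    log_a = math.log(a)

    def h(n):
        # Stationarity condition after eliminating N: 3*p*a**n - C*(3 + 2*n*ln(a)) = 0
        return 3.0 * p * a ** n - C * (3.0 + 2.0 * n * log_a)

    lo = math.log(C / p) / log_a  # below this n the constraint cannot be met; h(lo) < 0
    hi = lo + 1.0
    while h(hi) <= 0.0:           # h is convex and tends to +infinity: bracket its root
        hi += 1.0
    for _ in range(200):          # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    n_star = 0.5 * (lo + hi)
    N_star = (A / (p - C * a ** (-n_star))) ** 2  # N recovered from the constraint
    return N_star, n_star


# Hypothetical parameter values, for illustration only.
N_star, n_star = optimal_sizes(p=0.02, sigma=1.0, q_alpha=1.96, C=10.0, a=3.0)
print(N_star, n_star)
```

Since $N$ and $n$ must be integers in practice, a natural follow-up is to round $n^*$ up and recompute $N$ from the
constraint for that integer $n$.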

3. Numerical results 

3.1. Target model 

Our underlying model is given by the Burgers equation (1). Our output functional is taken to be: $f(X) =$

We set $\mathcal{N} = 60$, $\Delta t = 0.01$, $T = 0.05$, while the uncertain parameters $\nu$ and $u_{0m}$ are assumed to be uniformly
distributed, with respective ranges $[1; 20]$ and $[-0.3; 0.3]$. We also take $B = 300$ bootstrap replications and a
risk level $\alpha = 0.05$.

Note that more flexible parametrizations of the right-hand sides in (1) can be considered; results remain
qualitatively the same. We chose this parametrization for simplicity.

3.2. Convergence benchmark 

Figure 1 shows the lower $\hat{S}^{m}$ and upper $\hat{S}^{M}$ bounds for different reduced basis sizes $n$ and a fixed sample
of size $N = 300$, as well as the endpoints of the combined confidence intervals. This figure exhibits the fast
convergence of our bounds to the true value of $S_\nu$ as the reduced basis size increases. We also see that the part
of the error due to sampling (the gaps between the confidence interval upper bound and the upper bound, and between
the confidence interval lower bound and the lower bound) remains constant, as the sample size stays the same.

3.3. Comparison with estimation on the full model

To demonstrate the interest of performing sensitivity analysis on the reduced model, we computed the combined
confidence intervals for the two sensitivity indices using sample size $N = 22000$ and reduced basis size $n = 11$ (those parameters are found
using the procedure described in Section 2.3 for a target precision $p = 0.02$). We found $[0.0674128; 0.0939712]$ for the
sensitivity index for $\nu$, and $[0.914772; 0.926563]$ for the sensitivity index with respect to $u_{0m}$. These confidence intervals
have mean length $0.019 \approx 0.02$, as desired. This computation took 58.77 s of CPU time to complete (less than
1 s being spent in the offline phase).
Figure 1. Convergence benchmark for the sensitivity index of $\nu$. We plotted, for a fixed sample
of size $N = 300$, the estimator bounds $\hat{S}^{m}$ and $\hat{S}^{M}$, and the endpoints of the 95% combined confidence
interval, for different reduced basis sizes.

To obtain a result of the same precision, we carry out a simulation on the full model, for $N = 22000$ (sample size
can be chosen smaller than before, as there will be no metamodel error); we get a bootstrap confidence interval
with mean length of $\approx 0.0193$ (we can only provide a confidence interval, as the exact values of the sensitivity
indices are not known in this case). This computation takes 294 s of CPU time to complete. Hence, on this
example, using a reduced-basis model roughly divides overall computation time by a factor of 5, without any
sacrifice in the precision or the rigour (as our metamodel error quantification procedure is fully proven
and certified) of the confidence interval. We expect higher time savings with more complex (for example, two-
or three-dimensional in space) models.
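
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The snippet below only
reuses the interval endpoints and CPU times reported above; the variable names are ours.

```python
# Combined confidence intervals reported for the reduced model (N = 22000, n = 11).
intervals = {
    "nu": (0.0674128, 0.0939712),
    "u0m": (0.914772, 0.926563),
}

# Mean length of the two intervals, to be compared with the target precision 0.02.
lengths = {name: upper - lower for name, (lower, upper) in intervals.items()}
mean_length = sum(lengths.values()) / len(lengths)

# Ratio of the CPU times reported for the full model and for the reduced model.
speedup = 294.0 / 58.77

print(f"mean length = {mean_length:.4f}, speed-up = {speedup:.2f}")
```

The mean length comes out just below the target of 0.02 and the time ratio is very close to 5, in line with the
statements above.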

This work has been partially supported by the French National Research Agency (ANR) through the COSINUS program
(project COSTA-BRAVA, no. ANR-09-COSI-015).